---
title: IceVision Bboxes - Real Data
keywords: fastai
sidebar: home_sidebar
nb_path: "nbs/iv_bbox_real.ipynb"
---
{% raw %}
{% endraw %}

This is a mashup of IceVision's "Custom Parser" example and their "Getting Started (Object Detection)" notebooks, used to analyze the SPNet Real dataset, for which I generated bounding boxes. -- S.H. Hawley, July 1, 2021

Installing IceVision and IceData

If you are on Colab, run the following cell; otherwise, check the installation instructions.

{% raw %}
#try:
#    !wget https://raw.githubusercontent.com/airctic/icevision/master/install_colab.sh
#    !chmod +x install_colab.sh && ./install_colab.sh
#except:
#    print("Ignore the error messages and just keep going")
{% endraw %} {% raw %}
 
{% endraw %} {% raw %}
import torch, re
tv, cv = torch.__version__, torch.version.cuda
tv = re.sub(r'\+cu.*', '', tv)             # strip any local "+cuXXX" suffix
TORCH_VERSION = 'torch' + tv[0:-1] + '0'   # e.g. '1.8.1' -> 'torch1.8.0'
CUDA_VERSION = 'cu' + cv.replace('.', '')  # e.g. '10.2'  -> 'cu102'

print(f"TORCH_VERSION={TORCH_VERSION}; CUDA_VERSION={CUDA_VERSION}")
print(f"CUDA available = {torch.cuda.is_available()}, Device count = {torch.cuda.device_count()}, Current device = {torch.cuda.current_device()}")
print(f"Device name = {torch.cuda.get_device_name()}")
print("hostname:")
!hostname
TORCH_VERSION=torch1.8.0; CUDA_VERSION=cu102
CUDA available = True, Device count = 1, Current device = 0
Device name = TITAN X (Pascal)
hostname:
lecun
{% endraw %} {% raw %}
#!pip install -qq mmcv-full=="1.3.8" -f https://download.openmmlab.com/mmcv/dist/{CUDA_VERSION}/{TORCH_VERSION}/index.html --upgrade
#!pip install mmdet -qq
{% endraw %}

Imports

As always, let's import everything from icevision. We will also need pandas (you may need to install it with pip install pandas).

{% raw %}
from icevision.all import *
import pandas as pd
INFO     - The mmdet config folder already exists. No need to downloaded it. Path : /home/shawley/.icevision/mmdetection_configs/mmdetection_configs-2.10.0/configs | icevision.models.mmdet.download_configs:download_mmdet_configs:17
{% endraw %}

Download dataset

The original tutorials use a small sample of the chess dataset offered by Roboflow; here we instead load the espiownage "real" dataset from a local directory (the earlier download options are left commented out below).

{% raw %}
#data_dir = icedata.load_data(data_url, 'chess_sample') / 'chess_sample-master'

# SPNET Real Dataset link (currently proprietary, thus link may not work)
#data_url = "https://hedges.belmont.edu/~shawley/spnet_sample-master.zip"
#data_dir = icedata.load_data(data_url, 'spnet_sample') / 'spnet_sample-master' 

# public espiownage cyclegan dataset:
#data_url = 'https://hedges.belmont.edu/~shawley/espiownage-cyclegan.tgz'
#data_dir = icedata.load_data(data_url, 'espiownage-cyclegan') / 'espiownage-cyclegan'

# local data already there:
from pathlib import Path
data_dir = Path('/home/shawley/datasets/espiownage-cleaner')  # real data is local and private
{% endraw %}

Understand the data format

In this task we were given a .csv file with annotations. Let's take a look at it.

!!! danger "Important"
Replace source with your own path for the dataset directory.

{% raw %}
df = pd.read_csv(data_dir / "bboxes/annotations.csv")
df.head()
filename width height label xmin ymin xmax ymax
0 06240907_proc_00254.png 512 384 1 31 135 184 290
1 06240907_proc_00256.png 512 384 0 65 153 168 270
2 06240907_proc_00270.png 512 384 1 45 149 164 280
3 06240907_proc_00281.png 512 384 10 0 111 185 340
4 06240907_proc_00281.png 512 384 1 254 134 353 215
{% endraw %}

At first glance, we can make the following observations:

  • Multiple rows with the same filename, width, height
  • A label for each row
  • A bbox [xmin, ymin, xmax, ymax] for each row

Once we know what our data provides we can create our custom Parser.
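To sanity-check the first observation (several boxes can share one image), here is a minimal sketch using a toy DataFrame; the filenames and coordinates are made up for illustration, not taken from the dataset:

{% raw %}
```python
import pandas as pd

# Toy annotations mirroring the CSV columns above (hypothetical values)
toy = pd.DataFrame({
    "filename": ["a.png", "b.png", "b.png"],
    "width":  [512, 512, 512], "height": [384, 384, 384],
    "label":  [1, 0, 2],
    "xmin": [10, 20, 30], "ymin": [10, 20, 30],
    "xmax": [50, 60, 70], "ymax": [50, 60, 70],
})

# Rows that share a filename are separate boxes on the same image,
# so a parser should group them into one record per image:
boxes_per_image = toy.groupby("filename").size().to_dict()
print(boxes_per_image)  # {'a.png': 1, 'b.png': 2}
```
{% endraw %}

This per-filename grouping is exactly what the parser's record_id method accomplishes further down.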

{% raw %}
set(np.array(df['label']).flatten())
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
{% endraw %} {% raw %}
#df['label'] = ["Object"]*len(df)#  "_"+df['label'].apply(str)   # force label to be string-like
{% endraw %} {% raw %}
df['label'] /= 2                       # quantize the 0-11 labels down to 0-5
#df.head()
df['label'] = df['label'].apply(int)
print(set(np.array(df['label']).flatten()))
df['label'] = "_" + df['label'].apply(str) + "_"   # force labels to be string-like
{0, 1, 2, 3, 4, 5}
{% endraw %} {% raw %}
df.head()
filename width height label xmin ymin xmax ymax
0 06240907_proc_00254.png 512 384 _0_ 31 135 184 290
1 06240907_proc_00256.png 512 384 _0_ 65 153 168 270
2 06240907_proc_00270.png 512 384 _0_ 45 149 164 280
3 06240907_proc_00281.png 512 384 _5_ 0 111 185 340
4 06240907_proc_00281.png 512 384 _0_ 254 134 353 215
{% endraw %} {% raw %}
df['label'] = 'AN'  # collapse all labels to a single "antinode" class
df.head()
filename width height label xmin ymin xmax ymax
0 06240907_proc_00254.png 512 384 AN 31 135 184 290
1 06240907_proc_00256.png 512 384 AN 65 153 168 270
2 06240907_proc_00270.png 512 384 AN 45 149 164 280
3 06240907_proc_00281.png 512 384 AN 0 111 185 340
4 06240907_proc_00281.png 512 384 AN 254 134 353 215
{% endraw %}

Create the Parser

The first step is to create a template record for our specific type of dataset; in this case we're doing standard object detection:

{% raw %}
template_record = ObjectDetectionRecord()
{% endraw %}

Now we use the method generate_template, which prints out all the necessary steps we have to implement.

{% raw %}
Parser.generate_template(template_record)
class MyParser(Parser):
    def __init__(self, template_record):
        super().__init__(template_record=template_record)
    def __iter__(self) -> Any:
    def __len__(self) -> int:
    def record_id(self, o: Any) -> Hashable:
    def parse_fields(self, o: Any, record: BaseRecord, is_new: bool):
        record.set_img_size(<ImgSize>)
        record.set_filepath(<Union[str, Path]>)
        record.detection.add_bboxes(<Sequence[BBox]>)
        record.detection.set_class_map(<ClassMap>)
        record.detection.add_labels(<Sequence[Hashable]>)
{% endraw %}

We can copy the template and use it as our starting point. Let's go over each of the methods we have to define:

  • __init__: What happens here is completely up to you; normally we pass some reference to our data, data_dir in our case.

  • __iter__: This tells our parser how to iterate over our data; each item returned here will be passed to parse_fields as o. In our case we call df.itertuples to iterate over all df rows.

  • __len__: How many items we will be iterating over.

  • record_id: Should return a Hashable (int, str, etc.). In our case we want all the dataset items that have the same filename to be unified in the same record.

  • parse_fields: Here is where the attributes of the record are collected; the template suggests which methods we need to call on the record and what parameters each expects. The parameter o it receives is the item returned by __iter__.

!!! danger "Important"
Be sure to pass the correct type on all record methods!

{% raw %}
class BBoxParser(Parser):
    def __init__(self, template_record, data_dir):
        super().__init__(template_record=template_record)
        
        self.data_dir = data_dir
        self.df = pd.read_csv(data_dir / "bboxes/annotations.csv")
        #self.df['label'] /= 2
        #self.df['label'] = self.df['label'].apply(int) 
        #self.df['label'] = "_"+self.df['label'].apply(str)+"_"
        self.df['label'] = 'AN'  # make them all the same object
        self.class_map = ClassMap(list(self.df['label'].unique()))
        
    def __iter__(self) -> Any:
        for o in self.df.itertuples():
            yield o
        
    def __len__(self) -> int:
        return len(self.df)
        
    def record_id(self, o) -> Hashable:
        return o.filename
        
    def parse_fields(self, o, record, is_new):
        if is_new:
            record.set_filepath(self.data_dir / 'images' / o.filename)
            record.set_img_size(ImgSize(width=o.width, height=o.height))
            record.detection.set_class_map(self.class_map)
        
        record.detection.add_bboxes([BBox.from_xyxy(o.xmin, o.ymin, o.xmax, o.ymax)])
        record.detection.add_labels([o.label])
{% endraw %}

Let's randomly split the data and parse it with Parser.parse:

{% raw %}
parser = BBoxParser(template_record, data_dir)
{% endraw %} {% raw %}
train_records, valid_records = parser.parse()
INFO     - Autofixing records | icevision.parsers.parser:parse:136
{% endraw %}

Let's take a look at one record:

{% raw %}
show_record(train_records[5], display_label=False, figsize=(14, 10))
{% endraw %} {% raw %}
train_records[0]
BaseRecord

common: 
	- Image size ImgSize(width=512, height=384)
	- Record ID: 579
	- Filepath: /home/shawley/datasets/espiownage-cleaner/images/06240907_proc_01247.png
	- Img: None
detection: 
	- BBoxes: [<BBox (xmin:219, ymin:127, xmax:364, ymax:264)>]
	- Class Map: <ClassMap: {'background': 0, 'AN': 1}>
	- Labels: [1]
{% endraw %}

Moving On...

Following the Getting Started "refrigerator" notebook...

{% raw %}
# size is set to 384 because EfficientDet requires its inputs to be divisible by 128
image_size = 384  
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])

# Datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)
{% endraw %} {% raw %}
samples = [train_ds[0] for _ in range(3)]
show_samples(samples, ncols=3)
{% endraw %} {% raw %}
model_type = models.mmdet.retinanet
backbone = model_type.backbones.resnet50_fpn_1x(pretrained=True)
{% endraw %} {% raw %}
selection = 0


extra_args = {}

if selection == 0:
  model_type = models.mmdet.retinanet
  backbone = model_type.backbones.resnet50_fpn_1x

elif selection == 1:
  # The Retinanet model is also implemented in the torchvision library
  model_type = models.torchvision.retinanet
  backbone = model_type.backbones.resnet50_fpn

elif selection == 2:
  model_type = models.ross.efficientdet
  backbone = model_type.backbones.tf_lite0
  # The efficientdet model requires an img_size parameter
  extra_args['img_size'] = image_size

elif selection == 3:
  model_type = models.ultralytics.yolov5
  backbone = model_type.backbones.small
  # The yolov5 model requires an img_size parameter
  extra_args['img_size'] = image_size

model_type, backbone, extra_args
(<module 'icevision.models.mmdet.models.retinanet' from '/home/shawley/envs/icevision/lib/python3.8/site-packages/icevision/models/mmdet/models/retinanet/__init__.py'>,
 <icevision.models.mmdet.models.retinanet.backbones.resnet_fpn.MMDetRetinanetBackboneConfig at 0x7ff60ebf7790>,
 {})
{% endraw %} {% raw %}
model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args) 
/home/shawley/envs/icevision/lib/python3.8/site-packages/mmdet/core/anchor/builder.py:16: UserWarning: ``build_anchor_generator`` would be deprecated soon, please use ``build_prior_generator`` 
  warnings.warn(
Use load_from_local loader
The model and loaded state dict do not match exactly

size mismatch for bbox_head.retina_cls.weight: copying a param with shape torch.Size([720, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([9, 256, 3, 3]).
size mismatch for bbox_head.retina_cls.bias: copying a param with shape torch.Size([720]) from checkpoint, the shape in current model is torch.Size([9]).
{% endraw %} {% raw %}
train_dl = model_type.train_dl(train_ds, batch_size=8, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=8, num_workers=4, shuffle=False)
{% endraw %} {% raw %}
model_type.show_batch(first(valid_dl), ncols=4)
{% endraw %} {% raw %}
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
{% endraw %} {% raw %}
learn = model_type.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics)
{% endraw %} {% raw %}
learn.lr_find(end_lr=0.005)

# For Sparse-RCNN, use an even lower `end_lr`
/home/shawley/envs/icevision/lib/python3.8/site-packages/mmdet/core/anchor/anchor_generator.py:324: UserWarning: ``grid_anchors`` would be deprecated soon. Please use ``grid_priors`` 
  warnings.warn('``grid_anchors`` would be deprecated soon. '
/home/shawley/envs/icevision/lib/python3.8/site-packages/mmdet/core/anchor/anchor_generator.py:360: UserWarning: ``single_level_grid_anchors`` would be deprecated soon. Please use ``single_level_grid_priors`` 
  warnings.warn(
SuggestedLRs(lr_min=5.743491929024458e-05, lr_steep=5.313768269843422e-05)
{% endraw %} {% raw %}
learn.fine_tune(60, 1e-4, freeze_epochs=2)
epoch train_loss valid_loss COCOMetric time
0 0.658651 0.498617 0.464306 00:52
1 0.469725 0.400209 0.548968 00:48
epoch train_loss valid_loss COCOMetric time
0 0.390771 0.355131 0.582842 00:55
1 0.368687 0.344073 0.601226 00:54
2 0.363694 0.330220 0.603064 00:54
3 0.356962 0.326642 0.618339 00:54
4 0.355353 0.325830 0.610424 00:55
5 0.336918 0.315172 0.617550 00:54
6 0.329798 0.306824 0.633927 00:54
7 0.326428 0.299375 0.641579 00:54
8 0.324805 0.297577 0.639040 00:54
9 0.313980 0.298823 0.633870 00:54
10 0.310885 0.296335 0.626267 00:54
11 0.323794 0.295174 0.640600 00:54
12 0.315642 0.306940 0.620459 00:54
13 0.306164 0.292863 0.639179 00:54
14 0.307077 0.291687 0.631341 00:54
15 0.302419 0.292415 0.637071 00:54
16 0.307540 0.289878 0.632074 00:54
17 0.300110 0.285916 0.641625 00:54
18 0.292739 0.301792 0.640323 00:54
19 0.296331 0.288073 0.632265 00:54
20 0.288993 0.295582 0.613878 00:54
21 0.278794 0.291908 0.648801 00:54
22 0.283681 0.280127 0.648150 00:54
23 0.274897 0.285824 0.642898 00:54
24 0.286412 0.285834 0.644290 00:54
25 0.285934 0.279038 0.648806 00:54
26 0.271323 0.288621 0.638029 00:54
27 0.269954 0.281612 0.643152 00:54
28 0.266680 0.281168 0.646508 00:54
29 0.261132 0.287498 0.637203 00:54
30 0.261248 0.277343 0.649349 00:54
31 0.259545 0.282798 0.646609 00:54
32 0.253974 0.281841 0.649398 00:54
33 0.258075 0.286692 0.642202 00:54
34 0.259540 0.286558 0.639549 00:54
35 0.252741 0.287285 0.638743 00:54
36 0.244557 0.287405 0.644277 00:54
37 0.241395 0.285824 0.646684 00:54
38 0.252069 0.289285 0.643381 00:54
39 0.249674 0.288817 0.640122 00:54
40 0.232845 0.291657 0.643916 00:54
41 0.239928 0.291267 0.642035 00:54
42 0.236285 0.292477 0.635372 00:54
43 0.234847 0.289061 0.641251 00:54
44 0.235862 0.289175 0.645189 00:54
45 0.233823 0.286583 0.642454 00:54
46 0.231075 0.287693 0.643910 00:54
47 0.234239 0.293840 0.634784 00:54
48 0.230132 0.297524 0.633596 00:54
49 0.227520 0.289822 0.642310 00:54
50 0.225101 0.295107 0.636727 00:54
51 0.223715 0.290259 0.640763 00:54
52 0.221756 0.297446 0.632370 00:54
53 0.222278 0.294731 0.638411 00:54
54 0.222368 0.297183 0.633858 00:54
55 0.224260 0.294142 0.639211 00:54
56 0.222537 0.294346 0.639739 00:54
57 0.218411 0.294109 0.639100 00:54
58 0.207974 0.294933 0.638849 00:54
59 0.219063 0.294970 0.638602 00:54
{% endraw %} {% raw %}
model_type.show_results(model, valid_ds, detection_threshold=.5)
{% endraw %} {% raw %}
learn.save('iv_bbox_real')
{% endraw %}

Inference

{% raw %}
learn.load('iv_bbox_real')
<fastai.learner.Learner at 0x7ff6387fe130>
{% endraw %} {% raw %}
preds = model_type.predict(model, valid_ds, keep_images=True)
/home/shawley/envs/icevision/lib/python3.8/site-packages/mmdet/core/anchor/anchor_generator.py:324: UserWarning: ``grid_anchors`` would be deprecated soon. Please use ``grid_priors`` 
  warnings.warn('``grid_anchors`` would be deprecated soon. '
/home/shawley/envs/icevision/lib/python3.8/site-packages/mmdet/core/anchor/anchor_generator.py:360: UserWarning: ``single_level_grid_anchors`` would be deprecated soon. Please use ``single_level_grid_priors`` 
  warnings.warn(
{% endraw %} {% raw %}
show_preds(preds=preds[0:10])
{% endraw %} {% raw %}
len(train_ds), len(valid_ds), len(preds)
(1564, 391, 391)
{% endraw %}

Let's try to figure out how to extract what we want from these predictions.

{% raw %}
preds[0].pred
BaseRecord

common: 
	- Img: 384x384x3 <np.ndarray> Image
	- Record ID: 1502
	- Image size ImgSize(width=384, height=384)
detection: 
	- Class Map: <ClassMap: {'background': 0, 'AN': 1}>
	- Labels: [1]
	- Scores: [0.8102817]
	- BBoxes: [<BBox (xmin:101.09046936035156, ymin:94.21710205078125, xmax:217.6924285888672, ymax:215.40011596679688)>]
{% endraw %} {% raw %}
preds[0].pred.detection.scores
array([0.8102817], dtype=float32)
{% endraw %} {% raw %}
preds[0].pred.detection.bboxes
[<BBox (xmin:101.09046936035156, ymin:94.21710205078125, xmax:217.6924285888672, ymax:215.40011596679688)>]
{% endraw %} {% raw %}
preds[0].pred.detection.bboxes[0].xmin

def get_bblist(pred):
    "Convert a prediction's BBoxes to a list of [xmin, ymin, xmax, ymax] lists."
    return [[bb.xmin, bb.ymin, bb.xmax, bb.ymax] for bb in pred.pred.detection.bboxes]

get_bblist(preds[0])      
[[101.09047, 94.2171, 217.69243, 215.40012]]
{% endraw %} {% raw %}
preds[3].pred
BaseRecord

common: 
	- Img: 384x384x3 <np.ndarray> Image
	- Image size ImgSize(width=384, height=384)
	- Record ID: 1208
detection: 
	- Scores: []
	- Class Map: <ClassMap: {'background': 0, 'AN': 1}>
	- Labels: []
	- BBoxes: []
{% endraw %} {% raw %}
results = []
for i in range(len(preds)):
    if (len(preds[i].pred.detection.scores) == 0): continue   # skip images for which no boxes were predicted
    #print(f"i = {i}, file = {str(Path(valid_ds[i].common.filepath).stem)+'.csv'}, bboxes = {get_bblist(preds[i])}, scores={preds[i].pred.detection.scores}\n")
    worst_score = np.min(np.array(preds[i].pred.detection.scores))
    line_list = [str(Path(valid_ds[i].common.filepath).stem)+'.csv', get_bblist(preds[i]), preds[i].pred.detection.scores, worst_score, i]
    results.append(line_list)
    
# store as pandas dataframe
res_df = pd.DataFrame(results, columns=['filename', 'bblist','scores','worst_score','i'])
res_df = res_df.sort_values('worst_score')  # order by worst score as a "top losses" kind of thing
res_df.head() # take a look
filename bblist scores worst_score i
179 06240907_proc_01377.csv [[205.5557, 56.10971, 263.07602, 119.1846], [166.76402, 145.9241, 267.64172, 248.77289]] [0.709456, 0.5010721] 0.501072 243
277 06240907_proc_01692.csv [[166.40619, 143.38078, 272.57953, 260.54666]] [0.50184834] 0.501848 386
142 06241902_proc_00653.csv [[107.82741, 100.80711, 209.56075, 227.92378]] [0.50243336] 0.502433 194
259 06241902_proc_01364.csv [[107.8574, 100.29289, 219.56967, 216.9384], [0.0, 89.31752, 65.50232, 217.99678]] [0.8903083, 0.50315773] 0.503158 361
254 06241902_proc_01251.csv [[111.39845, 103.24177, 215.54222, 224.07587]] [0.50415325] 0.504153 353
{% endraw %} {% raw %}
res_df.to_csv('bboxes_top_losses_real.csv', index=False)
{% endraw %}
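A closing note on the "worst score" ordering above: it is just a per-image minimum over detection confidences, used to surface the shakiest predictions first. A minimal sketch of the same idea, with made-up score lists rather than real model output:

{% raw %}
```python
import numpy as np

# Hypothetical per-image detection confidences (not real model output)
scores_by_image = {
    "img1.png": [0.9, 0.55],
    "img2.png": [0.7],
    "img3.png": [0.95, 0.6, 0.85],
}

# Rank images by their least-confident detection, "top losses" style
ranked = sorted(scores_by_image, key=lambda k: float(np.min(scores_by_image[k])))
print(ranked)  # ['img1.png', 'img3.png', 'img2.png']
```
{% endraw %}

Reviewing images in this order is a cheap way to find annotations worth re-checking by hand.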